NEIGHBORLINESS OF MARGINAL POLYTOPES 
ABSTRACT. A neighborliness property of marginal polytopes of hierarchical models,
depending on the cardinality of the smallest non-face of the underlying simplicial complex,
is shown. The case of binary variables is studied explicitly, then the general case is
reduced to the binary case. A Markov basis for binary hierarchical models whose simplicial
complex is the complement of an interval is given.



1. Introduction 

The marginal polytope is an interesting combinatorial object that appears in statistics [8],
coding theory [13], [5], [10] and, under a different name, in toric algebra [6]. It encodes in its
face lattice the complete combinatorial information about the boundary of certain statistical
models. To define it we take a very brief excursion into statistics, namely the theory
of hierarchical models for contingency tables.

Consider a collection of $n$ random variables taking values in finite sets $\mathcal{X}_i$, $i = 1,\dots,n$.
We denote $N := \{1,\dots,n\}$, and its power set by $2^N := \{B : B \subseteq N\}$. For a subset $B \subseteq N$ of
the variables, we denote its set of values by $\mathcal{X}_B := \times_{i\in B}\mathcal{X}_i$ and abbreviate $\mathcal{X} := \mathcal{X}_N$. We
have the natural projections

(1)  $X_B : \mathcal{X} \to \mathcal{X}_B, \qquad (x_i)_{i\in N} \mapsto (x_i)_{i\in B} =: x_B.$

We slightly abuse notation and denote by $x_B$ the projection of $x$, which is a function of $x$,
and by the same symbol an arbitrary element $x_B \in \mathcal{X}_B$. A contingency table is a function
$u : \mathcal{X} \to \mathbb{N}_0$. It is thereby a vector in the space $\mathbb{N}_0^{\mathcal{X}}$. For $B \subseteq N$ we define the marginal
table $u_B \in \mathbb{N}_0^{\mathcal{X}_B}$ as the vector with components

(2)  $u_B(x_B) := \sum_{y \,:\, X_B(y) = x_B} u(y).$

A so-called hierarchical model for contingency tables can be given by a simplicial complex
$\Delta$ on the set of variable indices 18, J). The facets $\mathcal{F}$ of $\Delta$ are defined as the
inclusion-maximal faces. They determine the marginal map:

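(3)  $\pi_\Delta : \mathbb{R}^{\mathcal{X}} \to \bigoplus_{F\in\mathcal{F}} \mathbb{R}^{\mathcal{X}_F}, \qquad u \mapsto (u_F)_{F\in\mathcal{F}}.$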

It is a linear map computing all marginal tables corresponding to facets. We define cylinder
sets by setting, for $B \subseteq N$ and $y_B \in \mathcal{X}_B$,

(4)  $\{X_B = y_B\} := \{x \in \mathcal{X} : X_B(x) = y_B\}.$

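With respect to the canonical basis, the matrix representing $\pi_\Delta$ is the $d \times |\mathcal{X}|$ matrix

(5)  $A_\Delta := \left(A_{(B,y_B),x}\right)_{(B,y_B),x}$, where $A_{(B,y_B),x} := \begin{cases} 1 & \text{if } X_B(x) = y_B, \\ 0 & \text{otherwise.} \end{cases}$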



The rows of this matrix are indexed by pairs $(B, y_B)$, where $B \in \mathcal{F}$ is a facet of $\Delta$ and
$y_B \in \mathcal{X}_B$ is a configuration on $B$. Then $d$ is defined as the number of such pairs. If the
simplicial complex is clear, we will sometimes omit the index $\Delta$.

Definition 1 (Marginal Polytope). The marginal polytope is the convex hull of the columns
$A_x$, $x \in \mathcal{X}$, of $A_\Delta$:

(6)  $Q_\Delta := \operatorname{conv}\{A_x : x \in \mathcal{X}\} \subseteq \mathbb{R}^d.$

Example 2 (Two independent binary variables). In the case of two binary variables, we
have $\mathcal{X} = \{(00), (01), (10), (11)\}$. Let $\Delta = \{\{1\}, \{2\}\}$, then the matrix $A_\Delta$ is given as

(7)  $A_\Delta = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$

The rows are ordered as $(\{1\},0), (\{1\},1), (\{2\},0), (\{2\},1)$, and the columns as $(00), (01), (10), (11)$.
If $\Delta$ were the whole power set, $A_\Delta$ would be the $4 \times 4$ identity matrix. The marginal
polytopes are easily identified as a 2-dimensional square and a 3-dimensional simplex, respectively.
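
The construction of $A_\Delta$ in (5) can be sketched in a few lines of Python for binary variables. This is only an illustrative sketch: the function name `marginal_matrix` and the 0-based indexing of the variables are assumptions made here, not notation from the text.

```python
from itertools import product

def marginal_matrix(facets, n):
    """Return (row_labels, configs, A) for a binary model on variables 0..n-1.

    Rows are indexed by pairs (B, y_B) for each facet B, columns by the
    configurations x in {0,1}^n; the entry is 1 exactly if X_B(x) = y_B.
    """
    configs = list(product((0, 1), repeat=n))
    rows = [(B, y) for B in facets for y in product((0, 1), repeat=len(B))]
    A = [[1 if tuple(x[i] for i in B) == y else 0 for x in configs]
         for (B, y) in rows]
    return rows, configs, A

# Example 2: two independent binary variables, facets {1} and {2}
# (written 0-based as (0,) and (1,)).
rows, configs, A_indep = marginal_matrix([(0,), (1,)], n=2)
for label, row in zip(rows, A_indep):
    print(label, row)

# The full power set has the single facet {1, 2}.
_, _, A_full = marginal_matrix([(0, 1)], n=2)
```

With the column order 00, 01, 10, 11 produced by `itertools.product`, `A_indep` reproduces the matrix in (7) and `A_full` is the identity matrix.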

Our object of interest is the toric ideal:

(8)  $\mathcal{I}_\Delta := \langle p^u - p^v : u, v \in \mathbb{N}_0^{\mathcal{X}},\ \pi_\Delta(u) = \pi_\Delta(v) \rangle.$

Here, we used the standard notation for monomials in the variables $p_x$, $x \in \mathcal{X}$, namely
$p^u := \prod_{x\in\mathcal{X}} p_x^{u(x)}$. Throughout the whole paper we use the convention that $0^0 = 1$. The
set of indices with non-vanishing exponent will be called the support of the binomial,
$\operatorname{supp}(p^u - p^v) := \{x \in \mathcal{X} : u(x) + v(x) > 0\}$. The supports of $u$ and $v$ will also be called
the positive and negative support of the binomial, respectively. The ideal $\mathcal{I}_\Delta$ is a
homogeneous prime ideal in the polynomial ring $\mathbb{C}[p_x : x \in \mathcal{X}]$. In statistics, the restriction of the
corresponding variety to the non-negative real cone would be called the closure of an
exponential family. This seminal observation is the cornerstone of what is now called algebraic
statistics [4], [7], [12].

A first task is to find a suitable finite generating set of this ideal. Very useful is a Markov
basis, defined as follows:

Definition 3. A finite set $M \subseteq \ker_{\mathbb{Z}} \pi_\Delta$ is called a Markov basis for the hierarchical model
$\Delta$ if for any two contingency tables $u, v \in \mathbb{N}_0^{\mathcal{X}}$ with equal marginals $\pi_\Delta(u) = \pi_\Delta(v)$ there
exists a sequence $m_i$, $i = 1,\dots,l$, in $\pm M$ such that

(9)  $u = v + \sum_{i=1}^{l} m_i,$

where

(10)  $v + \sum_{i=1}^{k} m_i \in \mathbb{N}_0^{\mathcal{X}}$ for all $k = 1,\dots,l$.

The crucial property of a Markov basis is that any two tables having the same marginals
can be connected without leaving the non-negative cone. A key theorem states that exactly a
Markov basis gives the desired set of generators:

Theorem 4 ([4]). A finite set $M$ is a Markov basis if and only if

(11)  $\mathcal{I}_\Delta = \langle p^{m^+} - p^{m^-} : m \in M \rangle,$

where $m^+(x) := \max\{0, m(x)\}$, $m^-(x) := \max\{0, -m(x)\}$, such that $m = m^+ - m^-$.

The elements of a Markov basis are referred to as Markov moves. In the following
section we give a lower bound on the cardinality of the positive and negative support
of any move.
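
As a small illustration of Theorem 4, the following Python sketch splits an integer vector into $m^+$ and $m^-$ and checks that it lies in the kernel of the marginal matrix of Example 2. The helper names `split_move` and `apply_matrix` are hypothetical and chosen only for this sketch.

```python
def split_move(m):
    """Split an integer vector m into (m_plus, m_minus) with m = m_plus - m_minus."""
    m_plus = [max(0, c) for c in m]
    m_minus = [max(0, -c) for c in m]
    return m_plus, m_minus

def apply_matrix(A, u):
    """Compute the marginals A u for a matrix given as a list of rows."""
    return [sum(a * c for a, c in zip(row, u)) for row in A]

# Marginal matrix of two independent binary variables (Example 2),
# columns ordered 00, 01, 10, 11.
A = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]

m = [1, -1, -1, 1]            # candidate move: +00 -01 -10 +11
m_plus, m_minus = split_move(m)
in_kernel = all(c == 0 for c in apply_matrix(A, m))
degree = sum(m_plus)          # degree of the binomial p^{m+} - p^{m-}
```

Here the move corresponds to the binomial $p_{00}p_{11} - p_{01}p_{10}$ of degree 2, the familiar quadratic move of the independence model.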

2. A Lower Degree Bound



Theorem 5. Let $\Delta$ be a simplicial complex on $N$ and $\mathcal{I}_\Delta$ the corresponding toric ideal. Let
$g$ be the minimal cardinality of a non-face of $\Delta$. Each generator of $\mathcal{I}_\Delta$ has degree at least
$2^{g-1}$. Moreover, the positive and negative supports of each generator both have cardinality
at least $2^{g-1}$. The degree bound is realized only by square-free binomials.

Remark. Note that we give a lower bound on the smallest degree among the generators.
Lower bounds on the largest degree have been considered as a measure of complexity of
the model, for instance in Q. There, it is shown that one finds a simplicial complex on $2n$
units such that there exists a generator of degree $2^n$. Furthermore, in [3] the authors study
an algorithm which, for graph models, computes all generators of a given degree. Finally,
in [11] the case of 2-margins of $(r,s,3)$-tables is studied. It is shown that as $r$ and $s$ grow,
the support and degree of a maximal generator cannot be bounded. This has interesting
implications for data disclosure.

Remark (Graph models). A graph model is a hierarchical model for which $\dim \Delta \le 1$ holds.
If its graph is not complete, the bound reduces to the trivial bound $\deg m \ge 2$. On the other
hand, for the complete graph, there are no quadratic generators.

Remark (Type of generators). The vectors that achieve the bound (see Lemma 7) are natural
generalizations of the quadratic Markov moves for the independence model [4].

We will prove Theorem 5 in two steps. First, the binary case is studied explicitly. Then
the general case is reduced to the binary case.

2.1. The binary case. In this section we have $\mathcal{X} = \{0,1\}^N$. This will allow us to use a
special orthogonal basis of $\ker \pi_\Delta$. Using this basis, we find a lower bound on the
cardinality of the support of any element of the kernel.

Put $\Delta^c := 2^N \setminus \Delta$, the set of non-faces of $\Delta$. For elements $G \in \Delta^c$ we define the upper
intervals

(12)  $[G, N] := \{B \subseteq N : B \supseteq G\},$

which are contained in $\Delta^c$. Next, for each $B \subseteq N$ we define a vector $e_B \in \mathbb{R}^{\mathcal{X}}$ with
components

(13)  $e_B(x) := (-1)^{E(B,x)},$

where $E(B,x) := |\{i \in B : x_i = 1\}|$ is the number of entries equal to one that $x$ has in $B$.
Observe that $e_B$ depends on its argument only through $x_B$, the part in $B$. Therefore we will
sometimes abuse notation and write $e_B(x_B)$ for the value of $e_B$ at any configuration which
projects to $x_B$. We have

Lemma 6 ([9]). The set $\{e_B : B \subseteq N\}$ is an orthogonal basis of $\mathbb{R}^{\mathcal{X}}$ such that $\{e_B : B \in \Delta^c\}$
is a basis of $\ker \pi_\Delta$.

Remark (Characters). If we treat $\mathcal{X}$ as the additive group $(\mathbb{Z}/2\mathbb{Z})^n$, then the characters
of this group form an orthonormal basis (with respect to the product induced by the Haar
measure, which in this case is proportional to the standard product) of $\mathbb{C}^{\mathcal{X}}$. The characters
are exactly given by the vectors $e_B$, $B \subseteq N$. In our case the characters are real functions and
also a basis of $\mathbb{R}^{\mathcal{X}}$. See HHH for details.

Lemma 7. Let $G \in \Delta^c$ and $\mathcal{G} := [G, N]$. For $g := |G|$ it holds that

(14)  $\sum_{B \in \mathcal{G}} e_B(x) = \begin{cases} 2^{n-g}\, e_G(x_G) & \text{if } x_{N\setminus G} = (0,\dots,0), \\ 0 & \text{otherwise.} \end{cases}$

Furthermore, for any $x_G \in \mathcal{X}_G$ we have the identity

Proof. For the second case assume we have $i \in N \setminus G$ such that $x_i = 1$. Half of the sets
in $[G,N]$ contain $i$, while the other half do not, and the summands for $B$ and $B \cup \{i\}$
differ exactly by the sign $(-1)^{x_i} = -1$. It follows that the sum equals
zero if such an $i$ exists. The first case is now clear: all the summands are equal to $e_G(x_G)$
in this case, and there are exactly $2^{n-g}$ terms. The identity (15) follows by the same
argument. □
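
For small $n$ the statement (14) of Lemma 7 can be checked by brute force. The sketch below is only a numerical sanity check under the assumption of 0-based variable indices; the names `e`, `upper_interval` and `check_lemma7` are illustrative.

```python
from itertools import combinations, product

def e(B, x):
    """Character e_B(x) = (-1)^(number of ones of x inside B)."""
    return -1 if sum(x[i] for i in B) % 2 else 1

def upper_interval(G, n):
    """All supersets of the tuple G inside {0, ..., n-1}."""
    rest = [i for i in range(n) if i not in G]
    return [tuple(sorted(G + extra))
            for k in range(len(rest) + 1)
            for extra in combinations(rest, k)]

def check_lemma7(G, n):
    """Compare the sum over [G, N] with the closed form in (14) for every x."""
    g = len(G)
    outside = [i for i in range(n) if i not in G]
    for x in product((0, 1), repeat=n):
        total = sum(e(B, x) for B in upper_interval(G, n))
        zero_outside = all(x[i] == 0 for i in outside)
        expected = 2 ** (n - g) * e(G, x) if zero_outside else 0
        if total != expected:
            return False
    return True
```

For instance, `check_lemma7((0, 1), 4)` compares both sides of (14) on all 16 configurations of four binary variables.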

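Remark. By choosing appropriate signs in the sum, one can achieve any of the cylinder
sets $\{X_{N\setminus G} = y_{N\setminus G}\}$ instead of $\{X_{N\setminus G} = 0\}$ as the support. To be concrete, we have

(16)  $m^G_{y_{N\setminus G}}(x) := \sum_{B \in \mathcal{G}} (-1)^{E(B \setminus G,\, y_{N\setminus G})}\, e_B(x) = \begin{cases} 2^{n-g}\, e_G(x_G) & \text{if } x_{N\setminus G} = y_{N\setminus G}, \\ 0 & \text{otherwise.} \end{cases}$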
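
The signed sum in this remark can be evaluated directly. The following sketch computes it for a small example and reads off the support; the function name `signed_move` and the dictionary encoding of $y_{N\setminus G}$ are assumptions of this illustration, and indices are 0-based.

```python
from itertools import combinations, product

def signed_move(G, y_out, n):
    """Evaluate sum over B in [G, N] of (-1)^{E(B minus G, y)} e_B(x) for all x.

    G is a tuple of indices, y_out maps each index outside G to a value in {0, 1}.
    Returns a dict {configuration: value}.
    """
    rest = sorted(y_out)
    move = {}
    for x in product((0, 1), repeat=n):
        total = 0
        for k in range(len(rest) + 1):
            for extra in combinations(rest, k):   # B = G union extra
                ones_y = sum(y_out[i] for i in extra)
                ones_x = sum(x[i] for i in G) + sum(x[i] for i in extra)
                total += (-1) ** (ones_y + ones_x)
        move[x] = total
    return move

# G = {1, 2} (0-based (0, 1)) in three binary variables, with y_3 = 1.
move = signed_move(G=(0, 1), y_out={2: 1}, n=3)
support = sorted(x for x, c in move.items() if c != 0)
```

In this example the support is the cylinder set on which the third variable equals 1, and the non-zero values are $\pm 2^{n-g} = \pm 2$.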

The vectors we have just constructed have minimal support. In the following we will
deduce a technical but elementary statement about large subsets of $\mathcal{X}$. From Lemma 9 it will
follow that, choosing $G$ minimal in $\Delta^c$, the value $2^n - 2^g$, as in Lemma 7, is the maximal
number of zeros which can be achieved by non-trivial linear combinations of the vectors
$e_B$, $B \in \Delta^c$.

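Lemma 8. Let $g \in \{1,\dots,n\}$ be fixed. For $\mathcal{Y} \subseteq \mathcal{X}$ with $|\mathcal{Y}| > 2^n - 2^g$ the following
statement holds:

• For each $B \subseteq N$ with $|B| \ge g$, $\mathcal{Y}$ contains one of the cylinder sets $\{X_B = x_B\}$. More
formally: $\exists\, x_B \in \mathcal{X}_B$ such that $\{X_B = x_B\} \subseteq \mathcal{Y}$.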

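Proof. The statement follows from a simple cardinality argument. Assume the contrary:
let $B$ be given such that for all $x_B \in \mathcal{X}_B$ there exists $x \in \mathcal{X} \setminus \mathcal{Y}$ with $x_B = X_B(x)$. These $x$ are all distinct,
since they differ on $B$, and there are $2^{|B|} \ge 2^g$ of them. We find $|\mathcal{Y}| \le 2^n - 2^g$, a contradiction. □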
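
Lemma 8 is easy to confirm exhaustively for very small $n$. The sketch below enumerates all sufficiently large subsets of $\{0,1\}^n$; it is a brute-force check only, with hypothetical function names, and is not meant to scale beyond a handful of variables.

```python
from itertools import combinations, product

def contains_cylinder(Y, B, n):
    """True if the set Y contains some cylinder set {X_B = x_B} of {0,1}^n."""
    configs = list(product((0, 1), repeat=n))
    for xB in product((0, 1), repeat=len(B)):
        cylinder = [x for x in configs if tuple(x[i] for i in B) == xB]
        if all(x in Y for x in cylinder):
            return True
    return False

def check_lemma8(n, g):
    """Check the claim for every Y with |Y| > 2^n - 2^g and every B with |B| >= g."""
    configs = list(product((0, 1), repeat=n))
    min_size = 2 ** n - 2 ** g + 1
    big_B = [B for k in range(g, n + 1) for B in combinations(range(n), k)]
    for size in range(min_size, 2 ** n + 1):
        for Y in combinations(configs, size):
            Yset = set(Y)
            if not all(contains_cylinder(Yset, B, n) for B in big_B):
                return False
    return True
```

The cardinality bound cannot be relaxed: for $n = 3$ and $g = 2$, the four configurations with last entry 0 contain no cylinder set on the first two variables.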

Lemma 9. Let $g$ denote the minimal cardinality among the sets in $\Delta^c$. Then any non-zero
linear combination of the vectors $e_B$, $B \in \Delta^c$, has at least $2^{g-1}$ positive and $2^{g-1}$ negative
components.

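Proof. Assume we have a linear combination

(17)  $m = \sum_{B \in \Delta^c} z_B e_B \in \ker \pi_\Delta$

which has fewer than $2^{g-1}$ positive components. It has at least $2^n - 2^{g-1} + 1$ non-positive
components. Let $\mathcal{Y} \subseteq \mathcal{X}$ denote the corresponding indices. Let $G \in \Delta^c$ have cardinality
$g$ and choose $i \in G$ arbitrarily. By Lemma 8, applied with $g-1$ in place of $g$, we find a cylinder set
$\{X_{G\setminus\{i\}} = y_{G\setminus\{i\}}\}$ which is contained in $\mathcal{Y}$. We have

(18)  $m(x) = \sum_{B \in \Delta^c} z_B e_B(x) \le 0, \qquad x \in \mathcal{Y}.$

Summing up these equations over the cylinder set $\{X_{G\setminus\{i\}} = y_{G\setminus\{i\}}\}$ yields

(19)  $\sum_{x \in \{X_{G\setminus\{i\}} = y_{G\setminus\{i\}}\}} \ \sum_{B \in \Delta^c} z_B e_B(x) \le 0.$

Note that this summation is in fact the computation of the marginal $m_{G\setminus\{i\}}$ evaluated at the
value $y_{G\setminus\{i\}}$. Since $m \in \ker \pi_\Delta$ and $G \setminus \{i\} \in \Delta$, equality must hold in (19). We find that
every term in the sum was already zero:

(20)  $\sum_{B \in \Delta^c} z_B e_B(x) = 0, \qquad x \in \{X_{G\setminus\{i\}} = y_{G\setminus\{i\}}\}.$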

We will now inductively show that $m = 0$. Contained in $\{X_{G\setminus\{i\}} = y_{G\setminus\{i\}}\}$ we have a
smaller set $\{X_G = y_G\}$. Summing up the respective components of $m$ over this set we find,
using Lemma 7,

(21)  $0 = \sum_{x \in \{X_G = y_G\}} \ \sum_{B \in \Delta^c} z_B e_B(x) = z_G\, 2^{n-g}\, e_G(y_G).$

It follows that $z_G = 0$. Applying the same argument, we can show that all coefficients $z_H$
vanish for $|H| = g$. Inductively, we continue with sets of cardinality $g+1$. Finally, this
argument yields that all coefficients vanish and $m$ is zero. The whole procedure applies,
mutatis mutandis, to the negative components as well. □
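
The bound of Lemma 9 can be observed numerically on small examples. The sketch below enumerates all combinations with coefficients in $\{-1, 0, 1\}$ for the complex consisting of all sets of cardinality less than $g$; this particular choice of complex and coefficient range is an assumption made to keep the enumeration small, and the function names are illustrative.

```python
from itertools import combinations, product

def e(B, x):
    """Character e_B(x) = (-1)^(number of ones of x inside B)."""
    return -1 if sum(x[i] for i in B) % 2 else 1

def min_sign_counts(n, g, coeff_range=(-1, 0, 1)):
    """Smallest numbers of positive and of negative entries over small combinations.

    Delta is taken to be all subsets of size < g, so that its non-faces are
    exactly the subsets of size >= g.
    """
    configs = list(product((0, 1), repeat=n))
    nonfaces = [B for k in range(g, n + 1) for B in combinations(range(n), k)]
    min_pos = min_neg = len(configs)
    for z in product(coeff_range, repeat=len(nonfaces)):
        if not any(z):
            continue  # skip the zero combination
        m = [sum(c * e(B, x) for c, B in zip(z, nonfaces)) for x in configs]
        min_pos = min(min_pos, sum(v > 0 for v in m))
        min_neg = min(min_neg, sum(v < 0 for v in m))
    return min_pos, min_neg
```

For $n = 3$ and $g = 2$ the minimum is attained, for example, by $e_{\{1,2\}} + e_{\{1,2,3\}}$, which has two positive and two negative entries.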

Lemma 9 completes the proof of Theorem 5 in the binary case. It shows that the degree
of each Markov move is at least $2^{g-1}$. Since in fact we have a lower bound for the support,
the degree bound can only be realized by square-free binomials.

2.2. The non-binary case. We now study the non-binary case. Let $\mathcal{X} = \times_{i\in N} \mathcal{X}_i$ be
some arbitrary finite configuration space.

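Definition 10. Let $\phi_i : \mathcal{X}_i \to \{0,1\}$, $i \in N$, be surjective maps. For each $B \subseteq N$, the
composed maps

(22)  $\phi_B : \mathcal{X}_B \to \{0,1\}^B, \qquad x_B \mapsto (\phi_i(x_i))_{i\in B}$

are called collapsing maps. Abbreviating, put $\phi := \phi_N$. We have an induced map on
contingency tables:

(23)  $\Phi : \mathbb{N}_0^{\mathcal{X}} \to \mathbb{N}_0^{\{0,1\}^N}, \qquad \Phi(u)(z) := \sum_{x \in \phi^{-1}(z)} u(x), \quad z \in \{0,1\}^N.$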

The key property of such a collapsing is that it commutes with marginalization.

Lemma 11. Let $u \in \mathbb{N}_0^{\mathcal{X}}$. For $B \subseteq N$, $z_B \in \{0,1\}^B$ it holds:

(24)  $\sum_{x_B \in \phi_B^{-1}(z_B)} \ \sum_{w \in \{X_B = x_B\}} u(w) = \sum_{y \in \{X_B = z_B\}} \ \sum_{w \in \phi^{-1}(y)} u(w).$

Note that for the cylinder set on the left-hand side, $\{X_B = x_B\} \subseteq \mathcal{X}$, while on the
right-hand side $\{X_B = z_B\} \subseteq \{0,1\}^N$.

Proof. Since on each side every $w$ appears at most once, it suffices to show the set equality

(25)  $\bigcup_{x_B \in \phi_B^{-1}(z_B)} \{X_B = x_B\} = \bigcup_{y \in \{X_B = z_B\}} \phi^{-1}(y).$

"⊆": Let $w$ from the left-hand side be given. One has $X_B(w) = x_B$ for some $x_B$ with
$\phi_B(x_B) = z_B$. Therefore $\phi(w) = y$ with $X_B(y) = z_B$, and $w$ is contained in the
right-hand side.

"⊇": Let $w \in \phi^{-1}(y)$, $y \in \{X_B = z_B\}$, from the right-hand side be given. We have
$X_B(w) \in \phi_B^{-1}(z_B)$, so $w$ is contained in the left-hand side.

□

Lemma 12. Let $u, v \in \mathbb{N}_0^{\mathcal{X}}$ be contingency tables. Denote the marginal map in the
non-binary model by $\pi_\Delta$ and the corresponding binary one by $\rho_\Delta$. Then $\pi_\Delta(u) = \pi_\Delta(v)$
implies $\rho_\Delta(\Phi(u)) = \rho_\Delta(\Phi(v))$.

Proof. Let $B \in \Delta$, $z_B \in \{0,1\}^B$. We have to show that

(26)  $\sum_{y \in \{X_B = z_B\}} \Phi(u)(y) = \sum_{y \in \{X_B = z_B\}} \Phi(v)(y).$

By definition this equation is

(27)  $\sum_{y \in \{X_B = z_B\}} \ \sum_{w \in \phi^{-1}(y)} u(w) = \sum_{y \in \{X_B = z_B\}} \ \sum_{w \in \phi^{-1}(y)} v(w).$

Using Lemma 11 and the hypothesis, the statement follows. □
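
That collapsing commutes with marginalization can be illustrated on a concrete table. In the sketch below the table `u`, the collapsing map $x \mapsto \min(x, 1)$ and the function names are hypothetical choices made for the example; tables are stored as dictionaries from configurations to counts.

```python
from itertools import product

def marginal(table, B):
    """Marginal of a table {configuration: count} on the index set B."""
    out = {}
    for x, count in table.items():
        key = tuple(x[i] for i in B)
        out[key] = out.get(key, 0) + count
    return out

def collapse(table, phis):
    """Induced map on tables: push counts forward along (phi_1, ..., phi_n)."""
    out = {}
    for x, count in table.items():
        z = tuple(phi(xi) for phi, xi in zip(phis, x))
        out[z] = out.get(z, 0) + count
    return out

# Hypothetical example data: three ternary variables, each collapsed
# to a binary one via the surjective map x -> min(x, 1).
phis = [lambda v: min(v, 1)] * 3
u = {x: (x[0] + 2 * x[1] + 3 * x[2]) % 4 for x in product(range(3), repeat=3)}

B = (0, 2)
# Collapse first, then marginalize on B ...
left = marginal(collapse(u, phis), B)
# ... versus marginalize on B first, then collapse the marginal table.
right = collapse(marginal(u, B), [phis[i] for i in B])
```

Both orders of operation yield the same binary marginal table, which is the content of (24).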

Proof of Theorem 5. Using the collapsing map, from generators of the non-binary model we can
construct relations in the corresponding binary model as follows. Consider the polynomial
rings $\mathcal{R} := \mathbb{C}[p_x : x \in \mathcal{X}]$ and $\mathcal{Q} := \mathbb{C}[q_z : z \in \{0,1\}^N]$. Given a simplicial complex $\Delta$,
denote by $\mathcal{I}_\Delta \subseteq \mathcal{R}$ the non-binary toric ideal and by $\mathcal{I}_\Delta^{\mathrm{bin}} \subseteq \mathcal{Q}$ the binary one.

To each binomial $p^{m^+} - p^{m^-} \in \mathcal{R}$ associate the collapsed binomial $q^{\Phi(m^+)} - q^{\Phi(m^-)} \in \mathcal{Q}$.
By Lemma 12 it is clear that elements of the toric ideal $\mathcal{I}_\Delta$ are mapped to $\mathcal{I}_\Delta^{\mathrm{bin}}$.
Furthermore, the supports of $q^{\Phi(m^+)}$ and $q^{\Phi(m^-)}$ have cardinality at most that of the
supports of $p^{m^+}$ and $p^{m^-}$, respectively. Finally, if the non-binary model had a generator violating the
statement of the theorem, then we could choose the maps $\phi_i$, $i \in N$, in such a way that this
generator gets mapped to a non-zero binomial which violates the statement for the binary
case. This contradiction concludes the proof. □



3. Neighborliness 

Before stating the neighborliness property we take another short excursion into statistics,
introducing so-called exponential families and their relation to marginal polytopes.

Let again $\Delta$ denote a simplicial complex. For each $x \in \mathcal{X}$ let $A_x$ denote the corresponding
column of the marginal matrix $A_\Delta$, as defined in (5). The exponential family associated to this
complex is the parametrized family of probability measures

(28)  $\mathbb{R}^{\mathcal{X}} \supseteq \mathcal{E}_\Delta := \{p_\theta(x) = Z(\theta)^{-1} \exp(\langle \theta, A_x \rangle) : \theta \in \mathbb{R}^d\}.$

Here, $Z(\theta) := \sum_{x\in\mathcal{X}} \exp(\langle \theta, A_x \rangle)$ is a normalization, called the partition function. As
in Section 1, $d$ is the number of rows of $A_\Delta$. By construction an exponential family is an
open subset of the simplex of all probability measures on $\mathcal{X}$. Typically one is interested in
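the closure $\overline{\mathcal{E}_\Delta}$, which is taken with respect to the usual topology of $\mathbb{R}^{\mathcal{X}}$. The closure of
$\mathcal{E}_\Delta$ equals the non-negative part of the toric variety $V(\mathcal{I}_\Delta)$ [6, Theorem 3.2]. By this fact, the
Markov basis gives the implicit equations cutting out the set $\overline{\mathcal{E}_\Delta}$.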

We can now state our main result in two equivalent formulations:

Theorem 13. Let $g$ be the minimal cardinality among the non-faces of $\Delta$.

Geometric Formulation: The marginal polytope $Q_\Delta$ is $(2^{g-1} - 1)$-neighborly.
Probabilistic Formulation: Every probability measure $p$ with $|\operatorname{supp}(p)| < 2^{g-1}$ is
contained in $\overline{\mathcal{E}_\Delta}$.

Proof. The probabilistic formulation is easy to see. Just observe that by Theorem 5 each
monomial appearing in the set of generators $\{p^{m^+} - p^{m^-} : m \in M\}$ has a support of
cardinality at least $2^{g-1}$. Therefore a $p$ with $|\operatorname{supp}(p)| < 2^{g-1}$ fulfills the
defining equations trivially, since every such monomial vanishes at $p$.

Now, the geometric formulation is due to the well-known fact that a set $\mathcal{Y} \subseteq \mathcal{X}$ is
the support set of some $p \in \overline{\mathcal{E}_\Delta}$ if and only if $\operatorname{conv}\{A_y : y \in \mathcal{Y}\}$ is a face of the marginal
polytope $Q_\Delta$. This is a consequence of the fact that the marginals computed by $A_\Delta$ form a
sufficient statistic for the exponential family $\mathcal{E}_\Delta$. □

Remark (The bound is sharp). At first sight one might expect a better neighborliness
property in the non-binary cases, for instance if every variable is ternary. However,
one can easily see that the bound is sharp in the sense that already for the
"no-three-way-interaction" model with ternary variables, given by $N = \{1,2,3\}$, $\mathcal{X}_i = \{0,1,2\}$ for
$i = 1,2,3$ and $\Delta = \{B \subseteq \{1,2,3\} : |B| \le 2\}$, one has square-free generators of degree 4.
They can easily be computed with 4ti2 [1] or looked up in the Markov Bases Database [14].
Then a $p$ supported exactly on the positive support is a counterexample to any
improvement of Theorem 13.

Remark (Maximizing Multiinformation). The so-called multiinformation is an entropic
quantity which generalizes mutual information to more than two variables. Denoting by
$H(p) := -\sum_{x\in\mathcal{X}} p(x) \log p(x)$ the entropy of $p$, and by $H_i(p) := -\sum_{x_i\in\mathcal{X}_i} p_{\{i\}}(x_i) \log p_{\{i\}}(x_i)$
the marginal entropy for $i \in N$, it is defined as

(29)  $MI(p) := \sum_{i\in N} H_i(p) - H(p).$

An interesting problem, considered in [2], is to maximize this function. There, all global
maximizers in the binary case are classified by giving their support sets. In particular, by
[2, Theorem 3.2] all global maximizers $p^*$ satisfy

(30)  $|\operatorname{supp}(p^*)| = 2.$

Let $\Delta_2 := \{B \subseteq N : |B| \le 2\}$ denote the uniform simplicial complex of order two. Then the
following is shown:

Corollary (Theorem 3.5 in [2]). All global maximizers of $MI$ are contained in $\overline{\mathcal{E}_{\Delta_2}}$.

In view of the bound on the cardinality of the support, this now also follows from our
Theorem 13, since $g = 3$ for $\Delta_2$ and hence $2^{g-1} = 4 > 2$.
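
To make definition (29) concrete, the following sketch evaluates the multiinformation of two distributions on three binary variables. The distributions and function names are illustrative choices, and natural logarithms are assumed.

```python
from math import log

def entropy(dist):
    """Shannon entropy (natural logarithm) of a dict {outcome: probability}."""
    return -sum(p * log(p) for p in dist.values() if p > 0)

def multiinformation(p, n):
    """MI(p) = sum_i H_i(p) - H(p) for p given as {configuration: probability}."""
    total = 0.0
    for i in range(n):
        marg = {}
        for x, px in p.items():
            marg[x[i]] = marg.get(x[i], 0.0) + px
        total += entropy(marg)
    return total - entropy(p)

n = 3
# A distribution with support of cardinality 2, as in (30).
p_two = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# The uniform distribution, under which the variables are independent.
uniform = {(a, b, c): 1 / 8 for a in (0, 1) for b in (0, 1) for c in (0, 1)}

mi_two = multiinformation(p_two, n)
mi_uniform = multiinformation(uniform, n)
```

For `p_two` every marginal is uniform on $\{0,1\}$ and $H(p) = \log 2$, so the value is $(n-1)\log 2 = 2\log 2$, while the uniform distribution has multiinformation 0.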



4. Markov Bases of high dimensional models 



Finally, in this last section, we show an example where the moves $m^G_{y_{N\setminus G}}$ already
constitute the full Markov basis. Consider again the binary case $\mathcal{X} = \{0,1\}^N$. Let $G \subseteq N$.
We denote by

(31)  $\Delta_G := 2^N \setminus [G,N] = \{B \subseteq N : B \not\supseteq G\}$

the complex of all sets not containing $G$. We have seen that the toric ideal for this complex
is generated in degree at least $2^{|G|-1}$. In this section we show that if $\Delta$ has the structure (31)
and the variables are binary, the Markov basis is given by the moves $m^G_{y_{N\setminus G}}$ as defined in (16),
and therefore $\mathcal{I}_{\Delta_G}$ is generated in exactly degree $2^{|G|-1}$. As the no-three-way-interaction
model is of the form (31), it is also clear that the statement of the following theorem does not
hold as soon as the variables are not binary.

Theorem 14. Let $\mathcal{G} = [G,N]$. A Markov basis of the binary hierarchical model given by
$\Delta_G$ is

(32)  $M := \{ m^G_{y_{N\setminus G}} : y_{N\setminus G} \in \mathcal{X}_{N\setminus G} \}.$



Proof. We apply the standard technique [3] of reducing the degree of a given binomial via
the moves in $M$. For convenience we introduce tableau notation [8] for monomials. In this
notation, the monomial $p^u$ is represented by listing each $x \in \mathcal{X}$, $u(x)$ times. For example,
$p_{000}\, p_{110}\, p_{111}^2$ will be written as the tableau

(33)  $\begin{bmatrix} 000 \\ 110 \\ 111 \\ 111 \end{bmatrix}.$

Assume $p^u - p^v \in \mathcal{I}_{\Delta_G}$. Without loss of generality we assume that $G = \{i,\dots,n\}$. We
can assume that $u$ and $v$ have disjoint supports; otherwise we write $p^u - p^v = q\,(p^{u'} - p^{v'})$ with a monomial $q$,
and the following argument shows that $p^{u'} - p^{v'}$ can be expressed in terms of the Markov
basis. Consider first the case $u(00\dots0) \ge 1$. Since the marginals on the $(n-1)$-sets

(34)  $\{1,2,\dots,n-1\},\ \{1,2,\dots,n-2,n\},\ \dots,\ \{1,2,\dots,i-1,i+1,\dots,n\}$

of $u$ and $v$ coincide, and $v(00\dots0) = 0$, we find that the given binomial has the form

(35)  $p^u - p^v = \begin{bmatrix} 00\dots0\,\underline{000\dots0} \\ \vdots \end{bmatrix} - \begin{bmatrix} 00\dots0\,\underline{100\dots0} \\ 00\dots0\,\underline{010\dots0} \\ \vdots \\ 00\dots0\,\underline{000\dots1} \\ \vdots \end{bmatrix},$

where the set $G$ is underlined. Applying the same argument in the other direction, namely
that $u$ has the same $(n-1)$-marginals on the sets (34), we find that $u(x) > 0$ for any $x$
which has exactly two non-zero positions, both lying in $G$; formally, $u(x) > 0$ for any $x$ with
$\operatorname{supp}(x) \subseteq G$ and $|\operatorname{supp}(x)| = 2$. We continue to find that $v(x) > 0$ for any $x$ with $\operatorname{supp}(x) \subseteq G$
and $|\operatorname{supp}(x)| = 3$. Repeating this argument we find that $p^u$ contains all configurations that are
zero outside $G$ and have an even number of ones in $G$. Conversely, $p^v$ contains all configurations
that are zero outside $G$ and have an odd number of ones in $G$. Altogether, this is exactly
the move $m^G_{(0,\dots,0)}$. Obviously, in the general case, if in the beginning we had started
with some other configuration instead of $00\dots0$, say $y$, the same argument leads to the
move $m^G_{y_{N\setminus G}}$ instead. Abbreviating the specific move as $m$ now, we write $p^u = K p^{m^+}$ and
$p^v = L p^{m^-}$ with some monomials $K, L$ and have

(36)  $p^u - p^v = K p^{m^+} - L p^{m^-} + K p^{m^-} - K p^{m^-} = K (p^{m^+} - p^{m^-}) - (L - K)\, p^{m^-}.$

The degree of $L - K$ is obviously smaller than the degree of $p^u - p^v$. Inductively it follows
that $p^u - p^v$ can be written as a combination of the moves $m^G_{y_{N\setminus G}}$. □
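
The connectivity property behind Theorem 14 can be tested by brute force in the smallest interesting case, $n = 3$ and $G = \{1,2\}$, where $\Delta_G$ has the facets $\{1,3\}$ and $\{2,3\}$. The sketch below is a sanity check only, not part of the proof. It assumes 0-based indices and uses the moves of (16) divided by $2^{n-g}$ so that their entries are $\pm 1$; this normalization and all function names are choices made for the illustration.

```python
from collections import deque
from itertools import product

CONFIGS = list(product((0, 1), repeat=3))
FACETS = [(0, 2), (1, 2)]   # facets of Delta_G for G = {1, 2}, 0-based (0, 1)

def marginals(t):
    """All facet marginals of a table t (a tuple indexed like CONFIGS)."""
    out = []
    for B in FACETS:
        for yB in product((0, 1), repeat=len(B)):
            out.append(sum(c for x, c in zip(CONFIGS, t)
                           if tuple(x[i] for i in B) == yB))
    return tuple(out)

def move(y3):
    """Normalized move: e_G(x_G) on the cylinder {x_3 = y3}, zero elsewhere."""
    return tuple((-1) ** (x[0] + x[1]) if x[2] == y3 else 0 for x in CONFIGS)

MOVES = [move(0), move(1)]

def reachable(start):
    """Tables reachable from start by adding +/- moves without going negative."""
    seen, queue = {start}, deque([start])
    while queue:
        t = queue.popleft()
        for m in MOVES:
            for s in (1, -1):
                nxt = tuple(a + s * b for a, b in zip(t, m))
                if min(nxt) >= 0 and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

def fibers_connected(max_total):
    """Check that every fiber of tables with total count <= max_total is connected."""
    fibers = {}
    for t in product(range(max_total + 1), repeat=len(CONFIGS)):
        if sum(t) <= max_total:
            fibers.setdefault(marginals(t), set()).add(t)
    return all(reachable(next(iter(f))) == f for f in fibers.values())
```

Calling `fibers_connected(3)` groups all tables with at most three counts by their marginals and runs a breadth-first search within each group, which corresponds to the connectivity requirement (9) and (10) of Definition 3 restricted to these small tables.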